Abstract

Rising inequality and the increasing privatization of urban space are drawing attention to some of the only public spaces left: libraries. This study analyzes the extent to which library service areas differ along lines of inequality such as race and class. It delineates library catchment areas in Chicago, IL and compares them with socio-economic data at the census tract and block level. This analysis is the first part of a two-pronged method that aims to answer the question of how distinct these catchment areas are.

Study Metadata

  • Key words: public space, libraries, population weighted aggregation, service areas, demographics
  • Subject: Social and Behavioral Sciences: Geography: Human Geography
  • Date created: 11/28/2023
  • Date modified: 12/5/2023
  • Spatial Coverage: Chicago, IL
  • Spatial Resolution: Census Tracts, Census Blocks, Library Service Areas
  • Spatial Reference System: EPSG:32616
  • Temporal Coverage: 2017-Present
  • Temporal Resolution: one-time observation; demographic data represent 5-year ACS estimates (2017-2021) and population counts come from the 2020 decennial census

Study design

This study is a reproduction of an original study of my own. As part of my independent research work with Professor Peter Nelson, I created a workflow in QGIS to answer the question: how do library service catchment areas differ along lines of race, class, gender, etc.? To streamline this research and make it reproducible and replicable, I decided to reproduce the workflow in R and create a research compendium for it as part of my final independent project in GEOG0361: Open GIScience.

This research aims to answer the following two questions. How do library service catchment areas differ along lines of race, class, gender, etc.? How do the public services in these catchment areas reflect the nature of their local constituents?

Materials and procedure

Computational environment

# record all the packages you are using here
# this includes any calls to library(), require(),
# and double colons such as here::i_am()
packages <- c( 
  "tidycensus", "tidyverse", "sf", "classInt", "readr", "tigris",
  "rgdal","rstudioapi", "here", "s2", "pastecs", "tmap", "knitr", 
  "kableExtra", "broom", "leaflet", "usethis", "deldir", "spatstat"
)

# force all conflicts to become errors
# if you load dplyr and use filter(), R has to guess whether you mean dplyr::filter() or stats::filter()
# the conflicted package forces you to be explicit about this
# disable at your own peril
# https://conflicted.r-lib.org/
require(conflicted)
## Loading required package: conflicted
# load and install required packages
# https://groundhogr.com/
if (!require(groundhog)) {
  install.packages("groundhog")
  require(groundhog)
}
## Loading required package: groundhog
## Attached: 'Groundhog' (Version: 3.1.2)
## Tips and troubleshooting: https://groundhogR.com
if(!require(here)){
  install.packages("here")
  require(here)
}
## Loading required package: here
## here() starts at C:/Users/azalecki/Documents/GitHub/Zalecki-2023
# this date will be used to determine the versions of R and your packages
# it is best practice to keep R and its packages up to date
groundhog.day <- "2023-06-26"
set.groundhog.folder("../../data/scratch/groundhog/")
## The groundhog folder already was '../../data/scratch/groundhog/'
# this replaces any library() or require() calls
groundhog.library(packages, groundhog.day)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1
## Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
## Loading required package: sp
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
##      (status 2 uses the sf package in place of rgdal)
## Please note that rgdal will be retired during October 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
## See https://r-spatial.org/r/2023/05/15/evolution4.html and https://github.com/r-spatial/evolution
## rgdal: version: 1.6-7, (SVN revision 1203)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.6.2, released 2023/01/02
## Path to GDAL shared files: C:/Users/azalecki/AppData/Local/R/win-library/4.3/rgdal/gdal
##  GDAL does not use iconv for recoding strings.
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 9.2.0, March 1st, 2023, [PJ_VERSION: 920]
## Path to PROJ shared files: C:/Users/azalecki/AppData/Local/R/win-library/4.3/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.6-1
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
## 
## Attaching package: 'pastecs'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## The following object is masked from 'package:tidyr':
## 
##     extract
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
## deldir 1.0-9      Nickname: "Partial Distinction"
## 
##      The syntax of deldir() has changed since version 
##      0.0-10.  In particular the "dummy points" facility 
##      (which was a historical artifact) has been removed. 
##      In the current version, 1.0-8, an argument "id" has 
##      been added to deldir().  This new argument permits the 
##      user to specifier identifiers for points.  The default 
##      behaviour is to continue using the indices of the 
##      points to identify them.  In view of the fact that 
##      point identifiers may be user-supplied, the arguement 
##      "number", in plot.deldir() and plot.tile.list(), has 
##      had its name changed to "labelPts", and the argument 
##      "nex" in plot.deldir() has had its name changed to 
##      "lex".  In addition the name of the forth component 
##      of the "cmpnt_col" argument in plot.deldir() has been 
##      changed from "num" to "labels".  There is a new 
##      function getNbrs(), and the function tileInfo() has 
##      been modified to include output from getNbrs(). 
##      Please consult the help.
## Loading required package: spatstat.data
## Loading required package: spatstat.geom
## spatstat.geom 3.2-1
## Loading required package: spatstat.random
## spatstat.random 3.1-5
## Loading required package: spatstat.explore
## Loading required package: nlme
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
## 
##     collapse
## spatstat.explore 3.2-1
## Loading required package: spatstat.model
## Loading required package: rpart
## spatstat.model 3.2-4
## Loading required package: spatstat.linnet
## spatstat.linnet 3.1-1
## 
## spatstat 3.0-6 
## For an introduction to spatstat, type 'beginner'
## Successfully attached 'tidycensus_1.4.1'
## Successfully attached 'tidyverse_2.0.0'
## Successfully attached 'sf_1.0-13'
## Successfully attached 'classInt_0.4-9'
## Successfully attached 'readr_2.1.4'
## Successfully attached 'tigris_2.0.3'
## Successfully attached 'rgdal_1.6-7'
## Successfully attached 'rstudioapi_0.14'
## Previously attached  'here_1.0.1'
## Successfully attached 's2_1.1.4'
## Successfully attached 'pastecs_1.3.21'
## Successfully attached 'tmap_3.3-3'
## Successfully attached 'knitr_1.43'
## Successfully attached 'kableExtra_1.3.4'
## Successfully attached 'broom_1.0.5'
## Successfully attached 'leaflet_2.1.2'
## Successfully attached 'usethis_2.2.1'
## Successfully attached 'deldir_1.0-9'
## Successfully attached 'spatstat_3.0-6'
# you may need to install a correct version of R
# you may need to respond OK in the console to permit groundhog to install packages
# you may need to restart R and rerun this code to load installed packages
# In RStudio, restart r with Session -> Restart Session

# record the R processing environment
# alternatively, use devtools::session_info() for better results
writeLines(
  capture.output(sessionInfo()),
  here("procedure", "environment", paste0("r-environment-", Sys.Date(), ".txt"))
)

# save package citations
knitr::write_bib(c(packages, "base"), file = here("software.bib"))

# set up default knitr parameters
# https://yihui.org/knitr/options/
knitr::opts_chunk$set(
  echo = FALSE, # Run code, show outputs (don't show code)
  fig.retina = 4,
  fig.width = 8,
  fig.path = paste0(here("results", "figures"), "/")
)

#set up Github repository as the R project
#use_github("azalecki/Zalecki-2023")

Data and variables

Each of the following subsections describes one secondary data source used in the study.

Chicago Shapefile

## Retrieving data for the year 2021
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
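The warning above indicates that the (hidden) code assigned a CRS with `st_crs<-`, which only relabels coordinates. If the boundary layer needs to be placed in the study's reference system (EPSG:32616), a reprojection is the safer step. This is a minimal sketch; `chicago` is a placeholder name for the boundary sf object.

```r
library(sf)
# st_transform() actually reprojects coordinates into the target CRS,
# whereas st_crs<- only replaces the CRS label without moving geometry
chicago <- st_transform(chicago, 32616)  # EPSG:32616 = WGS 84 / UTM zone 16N
```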

American Community Survey(ACS) Demographic Data

I will add a more comprehensive list of variables as my senior research project progresses, but in this code I work with one table: Household Income. I initially attempted to create a data table with variables from several tables, but given my very rudimentary skills in R, I was not successful. To keep moving forward with the code, I simplified the project somewhat and work with only one of the data tables.

## To install your API key for use in future sessions, run this function with `install = TRUE`.
## Getting data from the 2017-2021 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Loading ACS5 variables for 2021 from table B19001 and caching the dataset for faster future access.
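The messages above come from a hidden `tidycensus` download. A hedged sketch of what that call likely looks like, assuming wide output and tract geometry (the object name `acs_income` is illustrative; a Census API key must be set with `census_api_key()`):

```r
library(tidycensus)
# download ACS table B19001 (Household Income), 2017-2021 5-year estimates,
# at the tract level for Cook County, IL, with tract geometry attached
acs_income <- get_acs(
  geography = "tract",
  table = "B19001",
  state = "IL",
  county = "Cook",
  year = 2021,
  survey = "acs5",
  geometry = TRUE,
  output = "wide"   # one column per variable (estimate *E, margin *M)
)
```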

Public Library Locations

Data for Chicago Public Library locations comes in CSV format with coordinate data. Prior to uploading the CSV file to the GitHub repository, I used Microsoft Excel to manually separate the longitude and latitude values into two separate columns. No other data manipulation was done in Excel.

## Rows: 81 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): NAME, HOURS OF OPERATION, ADDRESS, CITY, STATE, PHONE, WEBSITE
## dbl (8): ZIP, Latitude, Longitude, Boundaries - ZIP Codes, Community Areas, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
##  [1] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [13] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [25] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [37] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [49] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [61] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [73] POINT POINT POINT POINT POINT POINT POINT POINT POINT
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
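The output above shows the CSV being parsed and converted to an sf point layer. A hedged sketch of that conversion, assuming the column names shown in the column specification (the file path and object name are illustrative):

```r
library(readr)
library(sf)
library(here)
# read the library locations CSV and build point geometry from the
# manually separated Longitude/Latitude columns, then project to the
# study CRS (coordinates are assumed to be WGS 84, EPSG:4326)
libraries <- read_csv(here("data", "raw", "public", "libraries.csv")) |>
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326) |>
  st_transform(32616)
```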

Population Data and Census Blocks for Cook County, IL

Because the ACS data tables do not come with total population counts, I have to bring in population data separately. My intentions for this project include a population-weighted aggregation, so I will be using smaller block-level data to more accurately estimate the population distribution.

## Getting data from the 2020 decennial Census
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Loading PL variables for 2020 from table P1 and caching the dataset for faster future access.
## Using the PL 94-171 Redistricting Data Summary File
## Using the PL 94-171 Redistricting Data Summary File
## Note: 2020 decennial Census data use differential privacy, a technique that
## introduces errors into data to preserve respondent confidentiality.
## ℹ Small counts should be interpreted with caution.
## ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
## This message is displayed once per session.
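The messages above come from a hidden `tidycensus` call to the 2020 decennial (PL 94-171) data. A hedged sketch of the likely call, using the total-population variable from table P1 (the object name `blocks` is illustrative):

```r
library(tidycensus)
# download 2020 decennial total population (P1_001N) at the block level
# for Cook County, IL, with block geometry attached
blocks <- get_decennial(
  geography = "block",
  variables = "P1_001N",   # total population from PL table P1
  state = "IL",
  county = "Cook",
  year = 2020,
  geometry = TRUE
)
```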

Prior observations

Chicago Shapefile

  • the full dataset has been observed: the city boundary layer was previously loaded and mapped in the original QGIS workflow that this study reproduces.

American Community Survey(ACS) Demographic Data

  • the full dataset has been observed: the Household Income (B19001) table was previously downloaded and explored in the original QGIS workflow that this study reproduces.

Public Library Locations

  • the full dataset has been observed: the CSV was inspected in Microsoft Excel, where the longitude and latitude values were split into separate columns before upload.

Population Data and Census Blocks for Cook County, IL

  • the full dataset has been observed: the block-level population data were previously joined and aggregated in the original QGIS workflow that this study reproduces.

Bias and threats to validity

Edge/shape effects when creating polygons to represent library service/catchment areas

Visualizing catchment areas for libraries is my first objective because, unlike primary schools with definite attendance boundaries, libraries do not have formal "service areas." Thiessen/Voronoi polygons have long been used to map catchment or service areas by proximity to points. As explained by Flitter et al. (n.d.), GIS tools that generate Thiessen polygons draw shapes around a layer of point data such that every location within a polygon is nearer to that polygon's center point than to any other point in the layer. These proximal regions assume that people are most likely to visit the library closest to them, and that library services should therefore reflect their local constituents. I recognize that this method has flaws because this is not always the case: some people frequent libraries outside their residential neighborhood for a variety of reasons, and there is no way to track that accurately. The alternatives would be to draw buffers around the library points, as in the Kang et al. (year) study, or to compute a network analysis. Thiessen polygons are, however, simpler and less computationally intensive than a full network analysis. Although they might seem arbitrary, I have attempted to improve validity by including a population-weighted aggregation to more accurately estimate the neighborhood characteristics of each library service area.

Data transformations

ACS data transformations

The ACS classifies the data it collects into its own income brackets, but I wanted to reclassify it into simpler bins for my purposes. The table below shows the bins I created for the Household Income ACS data.

| Variable Name in Study | Study Label | Variables Used from ACS Data | ACS Labels |
|:----------------------:|:-----------:|:----------------------------:|:----------:|
| hhi1 | under 25k | B19001_002E, B19001_003E, B19001_004E, B19001_005E | Less than $10,000; $10,000 to $14,999; $15,000 to $19,999; $20,000 to $24,999 |
| hhi2 | 25k - 49.9k | B19001_006E, B19001_007E, B19001_008E, B19001_009E, B19001_010E | $25,000 to $29,999; $30,000 to $34,999; $35,000 to $39,999; $40,000 to $44,999; $45,000 to $49,999 |
| hhi3 | 50k - 74.9k | B19001_011E, B19001_012E | $50,000 to $59,999; $60,000 to $74,999 |
| hhi4 | 75k - 99.9k | B19001_013E | $75,000 to $99,999 |
| hhi5 | 100k - 149.9k | B19001_014E, B19001_015E | $100,000 to $124,999; $125,000 to $149,999 |
| hhi6 | 150k - 199.9k | B19001_016E | $150,000 to $199,999 |
| hhi7 | over 200k | B19001_017E | $200,000 or more |
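The reclassification above can be sketched as sums over the wide-format estimate columns. This assumes the `get_acs(..., output = "wide")` column naming with an `E` suffix for estimates; the object name `acs_income` is illustrative.

```r
library(dplyr)
# collapse the 16 ACS B19001 income brackets into the seven study bins
acs_income <- acs_income |>
  mutate(
    hhi1 = B19001_002E + B19001_003E + B19001_004E + B19001_005E,                 # under 25k
    hhi2 = B19001_006E + B19001_007E + B19001_008E + B19001_009E + B19001_010E,   # 25k - 49.9k
    hhi3 = B19001_011E + B19001_012E,                                             # 50k - 74.9k
    hhi4 = B19001_013E,                                                           # 75k - 99.9k
    hhi5 = B19001_014E + B19001_015E,                                             # 100k - 149.9k
    hhi6 = B19001_016E,                                                           # 150k - 199.9k
    hhi7 = B19001_017E                                                            # over 200k
  )
```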

Code for other tables that I will be working with later.

From one of the ACS tables I selected the necessary geographic identifiers (STATEFP, COUNTYFP, TRACTCE, GEOID, NAME.X, ALAND, AWATER, geometry) along with the source fields I created in the previous step. For all of the following tables I selected only the source fields I had created, because I will be doing a spatial join and carrying the geographic fields again would be redundant.

I clipped the final table by the Chicago geometry so as to include only tracts that fall within Chicago's city boundaries.
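The clip described above can be sketched with `sf`; `acs_tracts` and `chicago` are placeholder names for the tract layer and the city boundary, assumed to share the same CRS.

```r
library(sf)
# keep only the portions of tract geometry inside the city boundary;
# st_union() dissolves the boundary so its attributes are not duplicated
acs_chicago <- st_intersection(acs_tracts, st_union(chicago))
```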

Library Catchment Areas

To create the catchment areas I will create Thiessen/Voronoi polygons from the library points.
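A minimal sketch of that step with `sf`, assuming `libraries` (points) and `chicago` (boundary) are in the same projected CRS; object names are illustrative:

```r
library(sf)
# build one Thiessen/Voronoi polygon per library point and clip to the city
voronoi <- libraries |>
  st_union() |>                          # combine points into one MULTIPOINT
  st_voronoi() |>                        # returns a GEOMETRYCOLLECTION of polygons
  st_collection_extract("POLYGON") |>    # pull the polygons out
  st_as_sf() |>
  st_join(libraries) |>                  # attach each library's attributes to its polygon
  st_intersection(st_union(chicago))     # clip to the Chicago boundary
```

If the default Voronoi envelope (the points' bounding box) does not cover the full city extent, `st_voronoi()` accepts an `envelope` argument to enlarge it.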

Filter Population Blocks

Join Population Data and Population Weighted Re-Aggregation

I am confident that the code up to this point is correct.

This is the point in the workflow where I start having issues. My intention was to generate a layer of centroids for the block data so I could join the block population data to the tracts. I would do this twice: once for the original tracts layer, and a second time after intersecting the tracts with the Voronoi diagram. When I join the centroids to the tracts layer, R creates a row for every centroid that falls into a tract. In theory, if I group by the tract ID (TRACTCE) and summarize the population column, I should get a total population value for every tract. However, when I grouped and summarized, I got around 200 fewer observations than expected. The map reveals that many tracts are not receiving the population data for some reason.

## Warning: st_centroid assumes attributes are constant over geometries
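One common cause of the missing-tract symptom described above is the join direction: if the centroids are the left side of `st_join()`, tracts that capture no centroid never appear, and they vanish again after `group_by()`/`summarize()`. A hedged sketch of the intended aggregation that keeps every tract; `tracts` and `blocks` are placeholder names, and `value` assumes the default long-format population column from `tidycensus`:

```r
library(sf)
library(dplyr)
# represent each block by its centroid so it joins to exactly one tract
block_pts <- st_centroid(blocks)

# keep tracts as the left side and use a left join, so tracts with no
# centroid survive with NA population instead of disappearing
tract_pop <- tracts |>
  st_join(block_pts, left = TRUE) |>   # one row per tract-centroid pair
  group_by(TRACTCE) |>
  summarize(pop = sum(value, na.rm = TRUE))
```

If tracts still come up empty, it is worth checking that both layers share the same CRS (`st_crs(tracts) == st_crs(block_pts)`) before joining.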

Results

Work in Progress.

Discussion

Work in Progress.

Integrity Statement

The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)

References

Flitter, H., Weckenbrock, P., & Weibel, R. (n.d.). Thiessen Polygon. Retrieved December 16, 2023, from http://www.gitta.info/Accessibilit/en/html/UncProxAnaly_learningObject4.html